DOCUMENT RESUME



TE 000 672 



ED 021 851 56 

By-Carroll, John B.; Cramer, H. Leslie
THE INTELLIGIBILITY OF TIME COMPRESSED SPEECH. FINAL REPORT.
Harvard Univ., Cambridge, Mass. Lab. for Research in Instruction.
Spons Agency-Office of Education (DHEW), Washington, D.C. Bureau of Research.
Bureau No-BR-5-0958
Pub Date Jul 68
Contract-OEC-7-31-0370-271
Note-36p.
EDRS Price MF-$0.25 HC-$1.52
Descriptors-AUDIO EQUIPMENT, BLIND, COMMUNICATION (THOUGHT TRANSFER), COMPREHENSION,
INFORMATION THEORY, INTONATION, *LANGUAGE RESEARCH, LECTURE, *LISTENING COMPREHENSION,
RESEARCH METHODOLOGY, SENTENCES, *SPEECH, *SPEECH COMPRESSION, TAPE RECORDINGS, TIME

Time-compressed speech is now being used to present recorded lectures to
groups at word rates up to two and one-half times that at which they were originally
spoken. This process is particularly helpful to the blind. This study investigated the
intelligibility of speech processed with seven different discard intervals and at seven
rates from two to five times the original. Optimal parameters for processing speech at
each of the rates are reported, along with a comparison of the intelligibility rendered by
three different modes of presentation. Headphone presentation was found to be
about 22% more intelligible on the average than presentation by loudspeaker. The
commercial equipment available for processing time-compressed speech varies the
sampling interval but maintains a constant discard interval. The findings indicate that
the sampling interval should be held constant for rates two to five times normal. A
sampling interval of 15 milliseconds was found to be optimal at all rates for a
low-pitched man's voice. A higher-pitched woman's voice would require an even shorter
sampling interval. Processing and presenting time-compressed speech using the
optimal parameters reported here should promote greater comprehension as well as
intelligibility at all rates, intelligibility being a necessary prerequisite for comprehension.
(Author)






FINAL REPORT 

Contract No. OE-7-31-0370-271



THE INTELLIGIBILITY OF TIME-COMPRESSED SPEECH



July 1968











U.S. DEPARTMENT OF 
HEALTH, EDUCATION, AND WELFARE

Office of Education 
Bureau of Research 







U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION



THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



Final Report 

Contract No. OE-7-31-0370-271



The Intelligibility of Time-Compressed Speech



John B. Carroll
and

H. Leslie Cramer



Laboratory for Research in Instruction 
Graduate School of Education 
Harvard University 

Cambridge, Massachusetts 02138 

July, 1968 



The research reported herein was performed pursuant to a contract
with the Office of Education, U.S. Department of Health, Education, and
Welfare. Contractors undertaking such projects under Government sponsorship
are encouraged to express freely their professional judgment in
the conduct of the project. Points of view or opinions stated do not,
therefore, necessarily represent official Office of Education position
or policy.



U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE 



Office of Education 
Bureau of Research 



PREFACE 



The following report was originally given as a
paper, the 18th of 20 presented at the Library of Congress
Conference on Time Compressed Speech, held at the University
of Louisville, Kentucky, October 19-21, 1966.
It was subsequently published as Chapter 18 (pp. 126-148)
in Foulke, E. (ed.), The Proceedings of the Louisville
Conference on Time Compressed Speech, May, 1967,
Center for Rate Controlled Recordings, University of
Louisville, Kentucky. Since it was presented toward the
end of the Conference, an explanation of the process of
time compression and definitions of terms were not included.
The reader who is interested in this process is urged
to refer to the complete Proceedings, which are available
at $2.00 from the University of Louisville, or from
ERIC. A much more detailed description of the two studies
contained in this report is available in Cramer, H. L.,
The Intelligibility of Time Compressed Speech, unpublished
doctoral dissertation, Harvard Graduate School of Education,
Cambridge, Mass., 1968.



TABLE OF CONTENTS



PREFACE

LIST OF TABLES

LIST OF ILLUSTRATIONS

TIME COMPRESSION PROCESS

INTELLIGIBILITY MEASUREMENT

METHODS

RESULTS AND FINDINGS

REFERENCES

ERIC REPORT RESUME

LIST OF TABLES

Table

1. Latin Square Design for Pilot Study

2. Amount of Delay for Each of the 7 Conditions in the Pilot Study

3. Analysis of Variance Table for Pilot Study

4. Latin Square Design for Main Study

5. Discard Intervals for Tape Series A to C



LIST OF ILLUSTRATIONS



Figure

1. Block Diagram of Playback Apparatus for Pilot Study

2. Pilot Study Intelligibility Plotted vs. Speech Rate with 7 1/2 MS
   Inter-aural Time Difference and No Inter-aural Time Difference
   in Presentation

3. Number of Errors in Pilot Study with Seven Amounts of Delay at
   Four Times Normal Speech Rate

4. Oscillograph Tracing of the Beginning of the Word "John"

5. Oscillograph Tracing of Figure Number 3-8 with the Same Tracing
   Offset on Itself by One Pitch Period

6. Block Diagram of Audio Playback Apparatus for Main Study

7. Intelligibility vs. Discard Interval for Stereophonic Mode of
   Presentation

8. Intelligibility vs. Discard Interval for Monophonic Mode of
   Presentation

9. Intelligibility vs. Discard Interval for Loudspeaker Mode of
   Presentation

10. Intelligibility vs. Sampling Interval for Stereophonic Mode of
    Presentation

11. Intelligibility vs. Sampling Interval for Monophonic Mode of
    Presentation

12. Intelligibility vs. Sampling Interval for Loudspeaker Mode of
    Presentation

13. Tempo Regulator Parameters Compared with Optimum Sampling Intervals

14. Main Study Intelligibility vs. Speech Rate for both Monophonic and
    Loudspeaker Presentation

15. Intelligibility vs. Plotted Speech Rate for Optimum Conditions in
    Pilot Study with Delay and Main Study Monophonic and Loudspeaker
    Presentation




The Intelligibility of Time-Compressed Speech



Time Compression Process

If we make a magnetic tape recording of speech at 7 1/2 inches per second
(ips) and play it at 15 ips, it can be heard in half of the time in
which it was originally recorded. All the frequencies double, however, giving a
high-pitched, so-called "Donald Duck" effect. Speech so processed and played back is
rather unpleasant and practically unintelligible. With the development of magnetic
tape recording, it has become possible to halve playback time by another method,
without the attendant frequency shift of speeded-up playback. This other
method involves cutting a tape recording every one quarter of an inch from beginning
to end, and discarding alternate pieces. The remaining pieces are then spliced
together to make a reconstituted tape that is half the length of the tape as
originally recorded. A tape processed by this chop-splice method will sound normal
on playback as far as the pitch of the speaker's voice is concerned,
although it will obviously sound as though the words were spoken rapidly.

Garvey and Henneman (1950) investigated the word intelligibility of speeded
speech produced by this chop-splice method. Garvey (1953) reported that
intelligibility remains high, above 90 per cent, at a word-per-minute rate 2 1/2
times that of the original recording. Garvey further reports (1965) that after
completing his thesis with the chop-splice method he never wanted to see another
tape or splicer; it is fortunate for researchers that Fairbanks, Everitt, and
Jaeger (1953, 1954, 1959) developed an electro-mechanical system for automatically
discarding segments of speech from a recorded tape. Their system uses four playback
heads mounted in a rotating drum to scan a magnetically recorded tape. The effective
time length of each speech sample scanned and retained is equivalent to the speech
time on each piece of tape spliced together in the chop-splice method, and is called
the sampling interval. This sampling interval is determined by the revolutions
per minute of the rotating head assembly. The time length of each speech sample
eliminated is likewise similar to the speech time on each piece of tape discarded
in the chop-splice method, and is called the discard interval. The discard interval
is determined by the speed of the magnetic recording tape around the rotating head
assembly. The output of Fairbanks' system is thus equivalent to the output produced
by the chop-splice method.
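
The sampling and discard intervals described above can be illustrated with a short digital sketch. This is not the Fairbanks apparatus or any equipment used in this study; it is a minimal software analogue of the chop-splice method, assuming the speech is available as a list of samples. The function name `compress` and its parameters are hypothetical, chosen only for this illustration.

```python
def compress(samples, sample_rate_hz, sampling_interval_s, discard_interval_s):
    """Digital analogue of the chop-splice method (illustrative sketch only).

    Keeps `sampling_interval_s` worth of signal, drops the next
    `discard_interval_s`, and repeats to the end of the recording.
    Retained pieces are joined end to end, like spliced tape.
    """
    keep = round(sampling_interval_s * sample_rate_hz)   # samples retained per cycle
    drop = round(discard_interval_s * sample_rate_hz)    # samples discarded per cycle
    if keep <= 0 or drop < 0:
        raise ValueError("sampling interval must be positive, discard non-negative")

    out = []
    pos = 0
    while pos < len(samples):
        out.extend(samples[pos:pos + keep])  # retained piece (sampling interval)
        pos += keep + drop                   # skip the discarded piece
    return out


if __name__ == "__main__":
    one_second = list(range(8000))           # stand-in for 1 s of speech at 8 kHz
    halved = compress(one_second, 8000, 0.02, 0.02)
    print(len(halved) / len(one_second))     # fraction of the original duration kept
```

With equal sampling and discard intervals the output lasts half as long as the input, which corresponds to discarding alternate quarter-inch pieces of tape. Pitch is unchanged because each retained piece is played at its original speed.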

Intelligibility Measurement

Previous studies dealing with the intelligibility of time-compressed speech
have used phonetically balanced spondaic-word lists (lists of two-syllable words
with equal stress on each syllable, e.g., horseshoe) (Fairbanks and Kodman, 1957).
A restricted list of fifty words is presented after time has been allowed for the
listener to familiarize himself with the words in the list, both by studying the
written list and by hearing the list in order while reading it. The lists
are then presented many times in different randomized orders. There is no known
previous investigation of the intelligibility of time-compressed words in context.

Fairbanks, Guttman, and Miron (1957) investigated comprehension of compressed
speech by using 1500-word technical passages presented at high word-per-minute (wpm)
rates,* tested by multiple-choice questions. However, it was not clear to which
of three possible causes a wrong answer might be attributed: (1) the subject
did not hear every speech sound because the discarded sample was so
long that whole sounds were dropped; (2) the distortion introduced by the
interruption frequency of the compression equipment was interfering with the signal;
or (3) there was a problem in perceiving at a rapid rate (i.e., difficulty in
cognitive processing). Unfortunately, this work was done only with a discard
interval of .02 seconds. The only speech compression apparatus commercially
available is the Eltro Information Rate Changer, manufactured by Telefonbau und
Normalzeit, Frankfurt-am-Main, Germany. This equipment uses a .04 second discard
interval at all compression ratios, and this cannot be altered. Tests run to determine
the comprehension of speeded speech by both blind pupils (Bixler, Foulke, Amster,
& Nolan, 1961; Foulke and Bixler, 1963 & 1964; Foulke, Amster, Nolan, & Bixler,
1962) and sighted pupils (Orr and Friedman, 1964; Friedman, Orr, Freedle, and
Norris, 1966; Friedman, Orr, and Norris, 1966; Voor, 1962; Wood, 1965) used this
equipment for compressing their materials.

Fairbanks and Kodman (1957) tested word intelligibility as a function
of time compression. Their curves suggest that the optimum rate of interruption
and length of discard interval are not constant across compression ratios. At
compression rates up to 75%, a discard interval of .01 sec. appears best; at 80
to 85%, .06 sec.; but at 90%, .(5 sec. When listening to connected discourse, a
person has cues to the words from both the context and the grammar of the sentences
(Miller, Heise, and Lichten, 1951; Goldman-Eisler, 1958 and 1961; Pollack, 1964;
Miller, 1962; and Savin, 1963).
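
The compression percentages, speed-up factors, and interval lengths quoted in this section are tied together by simple arithmetic. The sketch below works it through under one stated assumption: each cycle consists of one sampling interval S that is kept and one discard interval D that is dropped, as in the chop-splice description. The function names are hypothetical and do not describe any particular piece of equipment.

```python
def compression_pct(speedup):
    """Percentage of the original time removed, for a given speed-up factor."""
    return 100.0 * (1.0 - 1.0 / speedup)


def speedup_factor(pct):
    """Speed-up factor corresponding to a given compression percentage."""
    return 1.0 / (1.0 - pct / 100.0)


def sampling_interval(discard_s, speedup):
    """Sampling interval S implied by a fixed discard interval D.

    Assumes speed-up = (S + D) / S, hence S = D / (speed-up - 1).
    """
    return discard_s / (speedup - 1.0)


def discard_interval(sampling_s, speedup):
    """Discard interval D implied by a fixed sampling interval S."""
    return sampling_s * (speedup - 1.0)


if __name__ == "__main__":
    # Fixed .04 s discard interval: the sampling interval shrinks as rate rises.
    for f in (2.0, 3.0, 4.0):
        print(f, round(compression_pct(f), 2), round(sampling_interval(0.04, f), 4))
```

Under this assumption, equipment that fixes the discard interval must shorten the sampling interval as the rate rises, whereas holding the sampling interval fixed means lengthening the discard interval instead.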

The research reported here attempts to test the middle ground between
the intelligibility of words in isolation and the comprehension of long passages by
testing the word intelligibility of short sentences. This research was done in
two parts, the first being a pilot study to see if a difference of 15 to 30
milliseconds (ms.) in the time of presentation of sentences to the two ears would
improve the intelligibility of time-compressed speech. The second part was the
main study, which dealt with the intelligibility of the same passages used in the
pilot study. These were presented at seven different compression ratios, each
processed with seven different discard intervals. The passages used were from
the Harvard Psycho-Acoustic Laboratory (P.A.L.) Auditory Test No. 12 (Hudgins
et al., 1947), which consisted of seven passages of 28 questions each. The P.A.L.
test passages were preceded by the following introductory passage, which was used
to help motivate the students and also to allow a gradual increase of speech rate
to approximately the starting rate of the lists. Here is the introductory passage
and the first ten sentences.

This is an experiment to determine the intelligibility of speeded
speech. You will hear a passage, which has been specially processed,
in less time than it took the reader to read it. Even these instructions
to which you are now listening have been speeded by 20% as compared
with the original. The reader's voice sounds normal, however, rather
than high-pitched, as it would if a phonograph record of it were being
played at a higher speed than that at which it was recorded. Speech
processed in this manner is already being used to present material to
blind persons at rates up to 475 w.p.m. A normal speaking range
varies from a low 125 w.p.m. to a rapid 200 w.p.m. or more. The
blind high school student averages only 90 w.p.m. when reading braille,
while the average sighted high school student reads books at 200
w.p.m. Some blind people using both hands simultaneously to read
braille can reach reading speeds as high as 225 w.p.m. However,
this is exceptional, and less than 10% of the blind learn to read
braille at all. This experiment is designed to test the intelligibility
of compressed speech at rates up to 800 w.p.m., increasing by
increments of 50 words per minute over a series of seven passages. It
is contemplated that the results of this study will lead to improved
methods of presenting verbal material to the blind. It also seems
that such rapid speech can be useful in presenting tape recorded
reviews of lectures at rates so high that the 50 minute lecture can
be heard in as little as 10 to 12 minutes.



*Since the average length of words may vary, Carroll (1964) proposes
that the syllable-per-minute rate would perhaps be a better measure, with the
speed of ?65 syllables-per-minute as the average rate of normal speech production.



The following passages are from the Psycho-Acoustic Laboratory Test 
Number 12, published by Harvard University. 

List Number 1 

1. In what country is Chicago? 

2. What letter comes after Y? 

3. What is the color of grass? 

4. What number comes before 7? 

5. What part of the body do you write with? 

6. What comes out of a kitchen faucet? 

7. What number is between 8 and 10? 

8. Which is wetter, water or sand? 

9. Do you dig holes with a shovel or a rake? 



Methods 

For the pilot study this tape was presented by means of earphones to
Radcliffe students, who heard it from seven different tapes, each in the same
ascending order of compression, starting at 50% compression (494 syllables per
minute, or 398 words per minute, with a 1.24 syllable-to-word ratio). The subjects
were screened to eliminate any who had more than 5 decibels hearing loss in either
ear or more than 3 decibels difference between ears, using the Central Institute
for the Deaf (CID) Auditory Test W-2 (Davis et al., 1964, pg. 535).

This pilot study, which preceded the main study, was performed to determine
whether there was any difference in intelligibility when the speech was delayed at
one ear by seven different amounts, including zero delay. Table 1 shows the design
for this study. Table 2 shows the amounts of delay, which were obtained by adjustment
of a micrometer head. A playback head was attached to the micrometer so that the
head could be moved along the tape on which the passages were recorded. The ...



TABLE 1

Latin Square design for the Pilot Study showing distribution of passages from
the P.A.L. Auditory Test No. 12. Each subject heard 7 passages as shown in
each row, in ascending order of compression ratios. All passages were compressed
at .035 sec. discard interval.(a)

Speed-up Factor          2.0     2.33    2.67    3.0     3.33    3.67    4.0
Compression Ratio        50.00   57.15   62.40   66.60   70.00   72.75   75.00
Syllables/min.           494     576     658     740     822     905     987
Words/min.               398     464     531     597     663     730     796

Tape       Subjects
Numbers    Numbered
1          01-07         2       3       4       5       6       7       8
2          08-14         4       8       7       2       5       3       6
3          15-21         8       2       6       3       7       5       4
4          22-28         7       6       3       4       2       8       5
5          29-35         5       7       2       6       8       4       3
6          36-42         3       5       8       7       4       6       2
7          43-49         6       4       5       8       3       2       7
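
The defining property of the Latin Square in Table 1 is that every passage (numbered 2 to 8) occurs exactly once for each tape and exactly once at each speed-up factor. The following sketch checks that property. It is an illustration, not part of the original analysis, and it rests on an assumption: two cells of the scanned table are illegible (tape 6 at speed-up 2.67 and tape 7 at speed-up 4.0), and the values 8 and 7 used here are the only ones that complete a Latin square. The names `PILOT_DESIGN` and `is_latin_square` are hypothetical.

```python
# Rows: tapes 1-7; columns: the seven speed-up factors in ascending order.
# Entries are P.A.L. Test No. 12 passage numbers. Two entries (marked) are
# inferred from the Latin-square constraint, not read directly from the scan.
PILOT_DESIGN = [
    [2, 3, 4, 5, 6, 7, 8],
    [4, 8, 7, 2, 5, 3, 6],
    [8, 2, 6, 3, 7, 5, 4],
    [7, 6, 3, 4, 2, 8, 5],
    [5, 7, 2, 6, 8, 4, 3],
    [3, 5, 8, 7, 4, 6, 2],   # third entry (8) inferred
    [6, 4, 5, 8, 3, 2, 7],   # last entry (7) inferred
]


def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(r) == n and set(r) == symbols for r in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return rows_ok and cols_ok


if __name__ == "__main__":
    print(is_latin_square(PILOT_DESIGN))
```

A design of this kind keeps any single passage from being confounded with a particular tape or a particular speech rate.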




Each tape (numbered 1-7) was played 7 times. The first subject hearing each tape
had the same material presented at the same time in each ear. Each of the six
subsequent subjects to hear each tape had the material alternately delayed at one
ear (alternately right and left on every question). The seven amounts of delay,
including zero delay, were as follows:

TABLE 2

Amount of delay for each of the 7 conditions in the Pilot Study

Delay Condition No. &
Subject's Number for
Table number 1              1       2        3        4        5        6        7

Delay in Milliseconds       0.0     0.5      1.0      4.0      7.5      15.0     30.0

Micrometer Head
(in Inches)                 0       .00375   .00750   .03000   .05625   .11250   .22500
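
The micrometer readings in Table 2 follow from the delays by simple arithmetic: moving a playback head a distance d along a tape travelling at speed v shifts the signal in time by d / v. The sketch below reproduces the tabled readings under one assumption that the text of this section does not state, namely a tape speed of 7.5 inches per second; that value is inferred only because it reproduces all of the tabled readings. The names are hypothetical.

```python
TAPE_SPEED_IPS = 7.5  # assumption: inferred from Table 2, not stated in the Methods text


def head_offset_inches(delay_ms, tape_speed_ips=TAPE_SPEED_IPS):
    """Playback-head displacement needed to delay the signal by `delay_ms`."""
    return tape_speed_ips * delay_ms / 1000.0


if __name__ == "__main__":
    for delay_ms in (0.0, 0.5, 1.0, 4.0, 7.5, 15.0, 30.0):
        print(f"{delay_ms:5.1f} ms -> {head_offset_inches(delay_ms):.5f} in.")
```

At this assumed tape speed, the smallest non-zero delay of 0.5 ms corresponds to a head movement of less than four thousandths of an inch, which suggests why a micrometer adjustment was needed.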




(a) Reader averaged 199 words/min. or 247 syllables/min. across 7 lists,
for a syllable-to-word ratio of 1.24.
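
The rate rows of Table 1 can be approximately reproduced from the reader's average rate given in this note. The sketch below is illustrative only and rests on two assumptions: that the speed-up factors printed as 2.33, 2.67, 3.33 and 3.67 are rounded thirds (7/3, 8/3, 10/3, 11/3), and that the compression ratio is the percentage of the original time removed. The names are hypothetical. Words per minute match the table exactly under these assumptions; syllables per minute and compression ratios agree only to within rounding.

```python
BASE_WPM = 199             # reader's average rate, from the table note
SYLLABLES_PER_WORD = 1.24  # syllable-to-word ratio, from the table note

# Assumption: the printed factors 2.33, 2.67, ... are rounded thirds.
SPEEDUP_FACTORS = [2.0, 7 / 3, 8 / 3, 3.0, 10 / 3, 11 / 3, 4.0]


def compressed_rates(factor, base_wpm=BASE_WPM, syl_per_word=SYLLABLES_PER_WORD):
    """Speech rates and compression percentage for one speed-up factor."""
    wpm = base_wpm * factor
    return {
        "words_per_min": round(wpm),
        "syllables_per_min": round(wpm * syl_per_word),
        "compression_pct": round(100.0 * (1.0 - 1.0 / factor), 2),
    }


if __name__ == "__main__":
    for f in SPEEDUP_FACTORS:
        print(round(f, 2), compressed_rates(f))
```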




